The Columbia River is a 2000km river that runs from British Columbia through Washington and Oregon. The Columbia Basin Research center runs the DART program (Data Access in Real Time), part of which collects data on fish passage through dams along the river. The Bonneville Dam is one such dam, and the steelhead trout is one such fish.

Here’s an image of the Bonneville Dam!

(Photo by Duane Raver)

And here’s an image of a steelhead trout!

(Photo by Visitor7)

This project explores steelhead trout passage through the Bonneville Dam from 1939 to 2019. Primarily, it looks at temporal patterns of fish passage at the daily, seasonal, and annual level. Curious where the data comes from? Look no further!

Now let’s get into the code!

Load necessary packages and data

library(tidyverse)
library(janitor)
library(lubridate)
library(tsibble)
library(feasts)

raw_fish <- read_csv("cbr_fish_passage_bonneville_allyrs_steelhead.csv") %>% 
  clean_names()

Tidy the data

fish <- raw_fish %>%
  #drop_na(value) %>% #drop NA values
  #filter(value>=0) %>% #drop negative values because metadata doesn't explain what they mean
  separate(mm_dd, into=c("day", "month")) %>% #separate out month and day
  mutate(yr_mo_day = paste(year, match(month,month.abb), day, sep="-"), #combine all date columns and separate by -
         yr_mo = paste(year, match(month,month.abb), sep="-"),
         date = as.Date(yr_mo_day)) %>% #tell R this is a date
  filter(!is.na(date)) %>% #REMOVE DARN LEAP YEAR DATES (Feb 29th)
  mutate(year_month = tsibble::yearmonth(yr_mo_day)) %>% #tell R this is a date
  select("year", "year_month", "yr_mo", "date", "value") #remove unnecessary variables

Look at the data day-by-day (original time series observations)

fish_day <- ggplot(data=fish, aes(x=date, y=value)) +
  geom_line() +
  xlab("Date") +
  ylab("Passage counts") +
  theme_minimal()

fish_day

Fig 1. Daily counts of steelhead trout passage through the Bonneville dam from 1939 to 2019.

Trout counts seem to peak seasonally, which makes sense given their annual migration patterns.

Double check you did it right by looking at the first 1000 observations

fish_trimmed <- tail(fish, n=1000)

ggplot(data=fish_trimmed, aes(x=date, y=value)) +
  geom_line()

Look at the data month-by-month

fish_month <- ggplot(data=fish, aes(x=year_month, y=value)) +
  geom_line()

fish_month

Look at the data year-by-year

fish_yr <- ggplot(data=fish, aes(x=year, y=value)) +
  geom_line()

fish_yr

Make a season plot

#Coerce dataframe to a tsibble
fish_ts <- as_tsibble(fish, index= year_month)

duplicates(fish, index=year_month)
## # A tibble: 0 x 5
## # ... with 5 variables: year <dbl>, year_month <mth>, yr_mo <chr>, date <date>,
## #   value <dbl>
#plot tsibble fancy plots
fish_ts %>% autoplot(value)

fish_ts %>% gg_subseries(value) +
  #geom_line(aes(color=year)) +
  xlab("Date") +
  ylab("Passage counts") +
  theme_minimal()

Check out that sweet trout spawning season July-September! It appears that around 1950-1970, trout counts were down at times when we should’ve expected higher abundances. But those seem to be back to normal (and even showing historical highs around the 2000s).

#prep data frame for season plot
fish_summary <- fish %>% 
  group_by(yr_mo) %>%
  summarise(sums=sum(value)) %>% 
  mutate(year_month = tsibble::yearmonth(yr_mo), #tell R this is a date
         month = month(year_month, label = TRUE),
         year = year(year_month),
         sums=replace_na(sums, 0)) #replace NA values with 0

#coerce data frame into a tsibble
fish_summary_ts <- as_tsibble(fish_summary, index= year_month) %>% 
  fill_gaps() #fill gaps because some dates didn't take measurements

#plot season plot
fish_summary_ts %>% gg_season(sums)

#make the same season plot but now in ggplot
ggplot(data=fish_summary, aes(x=month, y=sums, group=year)) +
  geom_line(aes(color=year)) +
  scale_colour_gradientn(colours = c("red","turquoise","darkblue")) +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE)) + #remove scientific notation from y axis
  xlab("Month") +
  ylab("Passage counts") +
  theme_minimal()

Fig 2. Seasonal steelhead trout passage counts through the Bonneville Dam.

It appears that trout counts are generally greater from 1980-2020 than from 1940-1980. And, of course, there is a peak in counts during the spawning season from July to September.

Visualize annual steelhead passage counts

#prep data frame for annual counts plot
fish_annual <- fish_summary %>% 
  group_by(year) %>% 
  summarize(sumsofsums=sum(sums))

#plot it
ggplot(data=fish_annual, aes(x=year, y=sumsofsums)) +
  geom_line() +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE)) + #remove scientific notation from y axis
  xlab("Date") +
  ylab("Passage counts") +
  theme_bw()

Fig 3. Annual steelhead trout passage counts through the Bonneville dam from 1939 to 2019.

From 1980 until 2010, it appears that counts were on the rise! But recently they’ve dropped off, hopefully the counts start going back up in the next few years…